
Parallel optimization sampling clustering K-means algorithm for big data processing

ZHOU Runwu, LI Zhiyong, CHEN Shaomiao, CHEN Jing, LI Renfa

Journal of Computer Applications 2016, 36 (2): 311-315. DOI: 10.11772/j.issn.1001-9081.2016.02.0311

Abstract

To address the low accuracy and slow convergence of the K-means clustering algorithm, an improved K-means algorithm based on optimized sample clustering, named OSCK (Optimization Sampling Clustering K-means Algorithm), was proposed. First, multiple samples were drawn from the mass data by probability sampling. Second, the sample clustering results were modeled and evaluated according to the Euclidean-distance similarity principle for optimal clustering centers, and the sub-optimal sample clustering results were discarded. Finally, the final k clustering centers were obtained by weighted integration and evaluation of the remaining clustering results, and these k centers were used as the cluster centers of the big data set. Theoretical analysis and experimental results show that, compared with the baseline algorithms, the proposed method achieves better clustering accuracy on mass data and has strong robustness and scalability.
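The abstract only outlines OSCK, so what follows is a minimal Python sketch of the sample, cluster, filter, and integrate pipeline under assumptions of our own. It is not the authors' implementation. In this sketch, samples are drawn by simple random sampling and each sample is clustered with plain Lloyd's K-means. Each run is scored by its mean squared Euclidean distance on a shared evaluation subset, and the worse half of the runs is discarded. The surviving center sets are greedily aligned to the best run and averaged with inverse-error weights. All names (`osck_sketch`, `align`, `mean_sse`) and parameters (`num_samples`, `sample_size`, `keep_ratio`) are hypothetical.

```python
import random

# Hypothetical sketch: the sampling, scoring, filtering and weighting rules
# below are assumptions for illustration, not taken from the OSCK paper.

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def kmeans(points: list[Point], k: int, rng: random.Random, iters: int = 50) -> list[Point]:
    """Plain Lloyd's K-means with initial centers drawn at random from the points."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its group; an empty group keeps its old center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def mean_sse(points: list[Point], centers: list[Point]) -> float:
    """Mean squared distance from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points) / len(points)


def align(reference: list[Point], centers: list[Point]) -> list[Point]:
    """Greedily reorder `centers` so that the j-th entry matches reference[j]."""
    remaining = list(centers)
    aligned = []
    for r in reference:
        j = min(range(len(remaining)), key=lambda i: sq_dist(r, remaining[i]))
        aligned.append(remaining.pop(j))
    return aligned


def osck_sketch(data, k, num_samples=8, sample_size=200, keep_ratio=0.5, seed=0):
    rng = random.Random(seed)
    size = min(sample_size, len(data))
    # Assumed: one shared evaluation subset so every run is scored on the same points.
    eval_set = rng.sample(data, size)
    # Step 1: cluster several random samples (independent runs).
    runs = []
    for _ in range(num_samples):
        sample = rng.sample(data, size)
        centers = kmeans(sample, k, rng)
        runs.append((mean_sse(eval_set, centers), centers))
    # Step 2: drop the sub-optimal runs, i.e. those with the highest error.
    runs.sort(key=lambda run: run[0])
    kept = runs[: max(1, int(len(runs) * keep_ratio))]
    # Step 3: weighted integration; assumed weight = 1 / error,
    # with every kept run aligned to the best (lowest-error) run first.
    reference = kept[0][1]
    aligned = [align(reference, centers) for _, centers in kept]
    weights = [1.0 / (score + 1e-12) for score, _ in kept]
    total = sum(weights)
    dim = len(reference[0])
    return [
        tuple(sum(w * run[j][d] for w, run in zip(weights, aligned)) / total for d in range(dim))
        for j in range(k)
    ]


if __name__ == "__main__":
    # Usage: three well-separated 2-D Gaussian blobs.
    gen = random.Random(1)
    true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
            for cx, cy in true_centers for _ in range(200)]
    for c in osck_sketch(data, k=3):
        print(tuple(round(x, 2) for x in c))
```

Each sample is clustered independently, so step 1 is the natural place to parallelize the work. For example, each run could be submitted to a `concurrent.futures.ProcessPoolExecutor`. The sketch keeps the loop sequential for clarity.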
